Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add option to use DST structs for flex arrays #2772

Merged
merged 10 commits into from
Apr 1, 2024

Conversation

jsgf
Copy link
Contributor

@jsgf jsgf commented Feb 26, 2024

Add --flexarray-dst option to allow use of a dynamically sized struct to model a C flexible array member. For example, given the C struct:

struct msg {
    unsigned int tag;
    unsigned int len;
    char payload[];
};

generate the corresponding Rust type:

#[repr(C)]
#[derive(Debug)]
pub struct msg<FAM: ?Sized = [::std::os::raw::c_char; 0]> {
    pub tag: ::std::os::raw::c_uint,
    pub len: ::std::os::raw::c_uint,
    pub payload: FAM,
}

This single definition is used for both Sized (with a zero-length array as the FAM type) and DST (with a slice).

This also generates the helper methods to convert a raw pointer into a Rust view of the C type, with a given length.

// Sized implementations
impl msg<[::std::os::raw::c_char; 0]> {
    // Construct a DST reference from a Sized object.
    // SAFETY: there must be at least `len` initialized elements in the underlying storage
    pub unsafe fn flex_ref(
        &self,
        len: usize,
    ) -> &msg<[::std::os::raw::c_char]> {
        unsafe { Self::flex_ptr(self, len) }
    }

    // Same, but mutable
    pub unsafe fn flex_mut_ref(
        &mut self,
        len: usize,
    ) -> &mut msg<[::std::os::raw::c_char]> {
        unsafe { Self::flex_ptr_mut(self, len).assume_init() }
    }

    // Raw pointer variants
    pub unsafe fn flex_ptr<'unbounded>(
        ptr: *const Self,
        len: usize,
    ) -> &'unbounded msg<[::std::os::raw::c_char]> {
        unsafe { &*::std::ptr::from_raw_parts(ptr as *const (), len) }
    }

    pub unsafe fn flex_ptr_mut<'unbounded>(
        ptr: *mut Self,
        len: usize,
    ) -> ::std::mem::MaybeUninit<&'unbounded mut msg<[::std::os::raw::c_char]>>
    {
        unsafe {
            let mut uninit = ::std::mem::MaybeUninit::<
                &mut msg<[::std::os::raw::c_char]>,
            >::uninit();
            (uninit.as_mut_ptr() as *mut *mut msg<[::std::os::raw::c_char]>)
                .write(::std::ptr::from_raw_parts_mut(ptr as *mut (), len));
            uninit
        }
    }
}

// DST implementations
impl msg<[::std::os::raw::c_char]> {
    // Memory size & alignment for allocation
    pub fn layout(len: usize) -> ::std::alloc::Layout {
        unsafe {
            let p: *const Self =
                ::std::ptr::from_raw_parts(::std::ptr::null(), len);
            ::std::alloc::Layout::for_value_raw(p)
        }
    }
    // return the Sized variant along with the length
    pub fn fixed(&self) -> (&msg<[::std::os::raw::c_char; 0]>, usize) {
        unsafe {
            let (ptr, len) = (self as *const Self).to_raw_parts();
            (&*(ptr as *const msg<[::std::os::raw::c_char; 0]>), len)
        }
    }
    pub fn fixed_mut(
        &mut self,
    ) -> (&mut msg<[::std::os::raw::c_char; 0]>, usize) {
        unsafe {
            let (ptr, len) = (self as *mut Self).to_raw_parts();
            (&mut *(ptr as *mut msg<[::std::os::raw::c_char; 0]>), len)
        }
    }
}

Upside:

  • The flexible array member is directly expressed in a fairly conventional Rust type, with no need to interact with the bindgen-specific __IncompleteArrayField type
    • In particular, normal size_of will work, taking into account the dynamically sized extension

Problems/TODO:

  • depends on unstable ptr_metadata and layout_for_ptr features
  • from_ptr(_mut) return references with unbounded lifetimes - it would be up to the caller to constrain them to the underlying storage's lifetime
    • I experimented with a variant which also takes a "lifetime witness" parameter _lt: &'a LT which can be used to bound the returned lifetime, but I'll need to try it in practice to see if it works.

This is a prototype for #2771

@jsgf
Copy link
Contributor Author

jsgf commented Feb 26, 2024

Not sure why, but when I run cargo fmt locally it wants to change things all over the place.
Oh, cargo +nightly fmt

@jsgf jsgf force-pushed the flexarray-dst branch 2 times, most recently from 1f9175c to a56d13d Compare February 26, 2024 00:39
@jsgf
Copy link
Contributor Author

jsgf commented Feb 26, 2024

This doesn't handle the case of a C struct pointing to a DST object (ie thin vs fat pointers).

@jsgf jsgf force-pushed the flexarray-dst branch 5 times, most recently from 187c2e3 to 31b4fc8 Compare March 5, 2024 00:42
@jsgf jsgf marked this pull request as ready for review March 6, 2024 21:38
@jsgf
Copy link
Contributor Author

jsgf commented Mar 6, 2024

I've updated the API to:

  1. generate struct MyStruct<FAM: ?Sized = [u8; 0]> - ie add a type parameter for the flexible array member, which can be dynamically sized but by default is a zero-sized array to make the structure sized. This means that all other references to this type are to the sized variant (without parameter qualification) which makes it equivalent to using __IncompleteArrayField.
  2. Added flex_ref, flex_ref_mut, flex_ptr, flex_ptr_mut to convert a sized reference/pointer to an unsized one. These are unsafe, because you must provide the size of the flexible array correctly.
  3. The DST variant, struct MyStruct<[u8]> has methods fixed, fixed_mut to return the sized structure along with the length. It also has layout which returns the size & alignment needed for a memory allocation.

I've been using this API for one of my own projects, and so far it feels pretty comfortable.

I also changed the generation of PhantomData fields for type parameters to put them at the start of the structure. This was necessary as putting them at the end interfered with the FAM field which must be last. The side-effect of this is a lot of test fixtures changed in a trivial way.

@@ -1734,6 +1747,7 @@ impl<'a> FieldCodegen<'a> for BitfieldUnit {
accessor_kind,
parent,
parent_item,
idx == bfields.len() - 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, shouldn't this be: last_field && idx == ..? Otherwise, what guarantees that there isn't a non-bitfield field after the bitfields?

};

// XXX correct with padding on end?
match fields.last() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can simplify by using fields.last()? right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant that something like this:

diff --git a/bindgen/ir/comp.rs b/bindgen/ir/comp.rs
index 31c058ea..f6c9e629 100644
--- a/bindgen/ir/comp.rs
+++ b/bindgen/ir/comp.rs
@@ -834,9 +834,9 @@ impl CompFields {
             CompFields::Error => return None, // panic?
         };

-        match fields.last() {
-            None | Some(Field::Bitfields(..)) => None,
-            Some(Field::DataMember(FieldData { ty, .. })) => ctx
+        match fields.last()? {
+            Field::Bitfields(..) => None,
+            Field::DataMember(FieldData { ty, .. }) => ctx
                 .resolve_type(*ty)
                 .is_incomplete_array(ctx)
                 .map(|item| item.expect_type_id(ctx)),

Should compile and work the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I parsed that ? as a question rather than an operator.

fn clone(&self) -> Self { *self }
}
});
}

if needs_flexarray_impl {
let prefix = ctx.trait_prefix();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this out to its own function or such? I feel like that'd make it easier to reason about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, happy to do that. But the rest of the function is so enormous - do you think it should all be factored down?

@@ -9,10 +9,10 @@ pub type DoesNotUse_Typedefed<U> = U;
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct DoesNotUse_IndirectUsage<T, U> {
pub member: DoesNotUse_Aliased<T>,
pub another: DoesNotUse_Typedefed<U>,
pub _phantom_0: ::std::marker::PhantomData<::std::cell::UnsafeCell<T>>,
pub _phantom_1: ::std::marker::PhantomData<::std::cell::UnsafeCell<U>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you send the "move phantom fields to the beginning" as a separate PR? That's fine on its own, and would make this PR a lot easier to review (GitHub's UI for this is not my favourite).

Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsgf
Copy link
Contributor Author

jsgf commented Mar 19, 2024

OK updated with comments (and still on top of #2783)

@bors-servo
Copy link

☔ The latest upstream changes (presumably 3b5ce9c) made this pull request unmergeable. Please resolve the merge conflicts.

Copy link
Contributor

@emilio emilio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pvdrz do you recall if we have a way of annotating a feature flag as experimental? This is mostly usable only with nightly rust...

But anyways this looks good to me in principle. Need to do a deeper review of the implementation but didn't look bad on my previous skim :)

};

// XXX correct with padding on end?
match fields.last() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant that something like this:

diff --git a/bindgen/ir/comp.rs b/bindgen/ir/comp.rs
index 31c058ea..f6c9e629 100644
--- a/bindgen/ir/comp.rs
+++ b/bindgen/ir/comp.rs
@@ -834,9 +834,9 @@ impl CompFields {
             CompFields::Error => return None, // panic?
         };

-        match fields.last() {
-            None | Some(Field::Bitfields(..)) => None,
-            Some(Field::DataMember(FieldData { ty, .. })) => ctx
+        match fields.last()? {
+            Field::Bitfields(..) => None,
+            Field::DataMember(FieldData { ty, .. }) => ctx
                 .resolve_type(*ty)
                 .is_incomplete_array(ctx)
                 .map(|item| item.expect_type_id(ctx)),

Should compile and work the same.

@pvdrz
Copy link
Contributor

pvdrz commented Mar 20, 2024

yeah we literally have the experimental flag on the CLI and a feature as well. I think other features are gated behind it already.

@jsgf
Copy link
Contributor Author

jsgf commented Mar 20, 2024

Does --experimental do anything? I couldn't see anywhere that referenced it.

@jsgf
Copy link
Contributor Author

jsgf commented Mar 25, 2024

@emilio @pvdrz Ping? Are there any outstanding concerns about this?

@jsgf
Copy link
Contributor Author

jsgf commented Mar 27, 2024

Fixed:

  • use the unsafe wrapping machinery
  • the consistency of the flex_mut_ref vs flex_ptr_mut names (went with trailing _mut)
  • inlined the helpers since they should have effectively no runtime implementation (they're really typesystem operations).

@jsgf
Copy link
Contributor Author

jsgf commented Apr 1, 2024

@emilio @pvdrz Ping?

Copy link
Contributor

@emilio emilio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks fine. Sorry, last week has been short due to easter holidays :)

@emilio emilio enabled auto-merge (rebase) April 1, 2024 19:53
auto-merge was automatically disabled April 1, 2024 20:21

Merge queue setting changed

@bors-servo
Copy link

☔ The latest upstream changes (presumably fb39a30) made this pull request unmergeable. Please resolve the merge conflicts.

@emilio
Copy link
Contributor

emilio commented Apr 1, 2024

Please rebase and ping, happy to merge. Hopefully it's just a matter of regenerating test expectations. Sorry, this didn't get into the merge queue because it got configured right after I pressed the "Merge when ready" button :/

@jsgf
Copy link
Contributor Author

jsgf commented Apr 1, 2024

Do you want me to squash the commits?

jsgf and others added 10 commits April 1, 2024 15:11
This option uses Rust DST types to model C structures with flexible
array members.

For example, if you declare:
```c
struct record {
    int len;
    float values[];
}
```
this means it's a C structure with a well-defined prefix, but a tail of
`values` which depends on how much memory we've allocated for it.

We can model this in Rust with:
```rust
struct record {
    len: c_int,
    values: [f32],
}
```
which means more or less the same thing - there's a type which has a
known prefix, but its suffix is not directly known.
There are some tests which have two incomplete array fields; only the
very last one is valid as a flexible array member (in C) or can be a DST
field in Rust.
Put the pointer converting methods on the sized prefix types ([T; 0])
since they're the default type and what you're likely to be starting
with. Emphasize the ones which work on references since they have safe
lifetimes, but also have raw pointer variants, esp for handling
uninitialized cases.

The flex types have methods which return the sized type along with the
length.
Previously it was generating inner `unsafe` blocks for all
unsafe functions in conformance with Rust 2024, but now only do it when
`--wrap-unsafe-ops` is enabled for consistency with other generated
code.

Also rename `flex_mut_ref` -> `flex_ref_mut` to make it consistent with
`flex_ptr_mut` and general Rust convention.
The conversions between fixed and dynamically sized forms are essentially
type-level transforms which should have trivial implementations in terms
of generated code, so there's no reason not to make them inline.
@emilio
Copy link
Contributor

emilio commented Apr 1, 2024

Do you want me to squash the commits?

Nah, all good, thanks!

@emilio emilio enabled auto-merge April 1, 2024 22:14
@emilio emilio added this pull request to the merge queue Apr 1, 2024
Merged via the queue into rust-lang:main with commit 7e90434 Apr 1, 2024
33 checks passed
facebook-github-bot pushed a commit to facebook/sapling that referenced this pull request Apr 2, 2024
Summary:
[PR 2772](rust-lang/rust-bindgen#2772) adds the --flexarray-dst option to handle C Flexible Array Members as Rust dynamically sized types. Still pending review, but discussion has been positive.

Full details:

Add `--flexarray-dst` option to allow use of a dynamically sized struct to model a C flexible array member. For example, given the C struct:
```
struct msg {
    unsigned int tag;
    unsigned int len;
    char payload[];
};
```
generate the corresponding Rust type:
```rust
#[repr(C)]
#[derive(Debug)]
pub struct msg<FAM: ?Sized = [::std::os::raw::c_char; 0]> {
    pub tag: ::std::os::raw::c_uint,
    pub len: ::std::os::raw::c_uint,
    pub payload: FAM,
}
```
This single definition is used for both Sized (with a zero-length array as the FAM type) and DST (with a slice).

This also generates the helper methods to convert a raw pointer into a Rust view of the C type, with a given length.
```rust
// Sized implementations
impl msg<[::std::os::raw::c_char; 0]> {
    // Construct a DST reference from a Sized object.
    // SAFETY: there must be at least `len` initialized elements in the underlying storage
    pub unsafe fn flex_ref(
        &self,
        len: usize,
    ) -> &msg<[::std::os::raw::c_char]> {
        unsafe { Self::flex_ptr(self, len) }
    }

    // Same, but mutable
    pub unsafe fn flex_mut_ref(
        &mut self,
        len: usize,
    ) -> &mut msg<[::std::os::raw::c_char]> {
        unsafe { Self::flex_ptr_mut(self, len).assume_init() }
    }

    // Raw pointer variants
    pub unsafe fn flex_ptr<'unbounded>(
        ptr: *const Self,
        len: usize,
    ) -> &'unbounded msg<[::std::os::raw::c_char]> {
        unsafe { &*::std::ptr::from_raw_parts(ptr as *const (), len) }
    }

    pub unsafe fn flex_ptr_mut<'unbounded>(
        ptr: *mut Self,
        len: usize,
    ) -> ::std::mem::MaybeUninit<&'unbounded mut msg<[::std::os::raw::c_char]>>
    {
        unsafe {
            let mut uninit = ::std::mem::MaybeUninit::<
                &mut msg<[::std::os::raw::c_char]>,
            >::uninit();
            (uninit.as_mut_ptr() as *mut *mut msg<[::std::os::raw::c_char]>)
                .write(::std::ptr::from_raw_parts_mut(ptr as *mut (), len));
            uninit
        }
    }
}

// DST implementations
impl msg<[::std::os::raw::c_char]> {
    // Memory size & alignment for allocation
    pub fn layout(len: usize) -> ::std::alloc::Layout {
        unsafe {
            let p: *const Self =
                ::std::ptr::from_raw_parts(::std::ptr::null(), len);
            ::std::alloc::Layout::for_value_raw(p)
        }
    }
    // return the Sized variant along with the length
    pub fn fixed(&self) -> (&msg<[::std::os::raw::c_char; 0]>, usize) {
        unsafe {
            let (ptr, len) = (self as *const Self).to_raw_parts();
            (&*(ptr as *const msg<[::std::os::raw::c_char; 0]>), len)
        }
    }
    pub fn fixed_mut(
        &mut self,
    ) -> (&mut msg<[::std::os::raw::c_char; 0]>, usize) {
        unsafe {
            let (ptr, len) = (self as *mut Self).to_raw_parts();
            (&mut *(ptr as *mut msg<[::std::os::raw::c_char; 0]>), len)
        }
    }
}
```

Upside:
- The flexible array member is directly expressed in a fairly conventional Rust type, with no need to interact with the bindgen-specific `__IncompleteArrayField` type
  - In particular, normal `size_of` will work, taking into account the dynamically sized extension

Problems/TODO:
- depends on unstable `ptr_metadata` and `layout_for_ptr` features
- `from_ptr(_mut)` return references with unbounded lifetimes - it would be up to the caller to constrain them to the underlying storage's lifetime
  - I experimented with a variant which also takes a "lifetime witness" parameter `_lt: &'a LT` which can be used to bound the returned lifetime, but I'll need to try it in practice to see if it works.

This is a prototype for #2771

Reviewed By: dtolnay

Differential Revision: D54605921

fbshipit-source-id: 929ec448b758975ce736c21fde6b805e3d297bb7
facebook-github-bot pushed a commit to facebookexperimental/rust-shed that referenced this pull request Apr 2, 2024
Summary:
[PR 2772](rust-lang/rust-bindgen#2772) adds the --flexarray-dst option to handle C Flexible Array Members as Rust dynamically sized types. Still pending review, but discussion has been positive.

Full details:

Add `--flexarray-dst` option to allow use of a dynamically sized struct to model a C flexible array member. For example, given the C struct:
```
struct msg {
    unsigned int tag;
    unsigned int len;
    char payload[];
};
```
generate the corresponding Rust type:
```rust
#[repr(C)]
#[derive(Debug)]
pub struct msg<FAM: ?Sized = [::std::os::raw::c_char; 0]> {
    pub tag: ::std::os::raw::c_uint,
    pub len: ::std::os::raw::c_uint,
    pub payload: FAM,
}
```
This single definition is used for both Sized (with a zero-length array as the FAM type) and DST (with a slice).

This also generates the helper methods to convert a raw pointer into a Rust view of the C type, with a given length.
```rust
// Sized implementations
impl msg<[::std::os::raw::c_char; 0]> {
    // Construct a DST reference from a Sized object.
    // SAFETY: there must be at least `len` initialized elements in the underlying storage
    pub unsafe fn flex_ref(
        &self,
        len: usize,
    ) -> &msg<[::std::os::raw::c_char]> {
        unsafe { Self::flex_ptr(self, len) }
    }

    // Same, but mutable
    pub unsafe fn flex_mut_ref(
        &mut self,
        len: usize,
    ) -> &mut msg<[::std::os::raw::c_char]> {
        unsafe { Self::flex_ptr_mut(self, len).assume_init() }
    }

    // Raw pointer variants
    pub unsafe fn flex_ptr<'unbounded>(
        ptr: *const Self,
        len: usize,
    ) -> &'unbounded msg<[::std::os::raw::c_char]> {
        unsafe { &*::std::ptr::from_raw_parts(ptr as *const (), len) }
    }

    pub unsafe fn flex_ptr_mut<'unbounded>(
        ptr: *mut Self,
        len: usize,
    ) -> ::std::mem::MaybeUninit<&'unbounded mut msg<[::std::os::raw::c_char]>>
    {
        unsafe {
            let mut uninit = ::std::mem::MaybeUninit::<
                &mut msg<[::std::os::raw::c_char]>,
            >::uninit();
            (uninit.as_mut_ptr() as *mut *mut msg<[::std::os::raw::c_char]>)
                .write(::std::ptr::from_raw_parts_mut(ptr as *mut (), len));
            uninit
        }
    }
}

// DST implementations
impl msg<[::std::os::raw::c_char]> {
    // Memory size & alignment for allocation
    pub fn layout(len: usize) -> ::std::alloc::Layout {
        unsafe {
            let p: *const Self =
                ::std::ptr::from_raw_parts(::std::ptr::null(), len);
            ::std::alloc::Layout::for_value_raw(p)
        }
    }
    // return the Sized variant along with the length
    pub fn fixed(&self) -> (&msg<[::std::os::raw::c_char; 0]>, usize) {
        unsafe {
            let (ptr, len) = (self as *const Self).to_raw_parts();
            (&*(ptr as *const msg<[::std::os::raw::c_char; 0]>), len)
        }
    }
    pub fn fixed_mut(
        &mut self,
    ) -> (&mut msg<[::std::os::raw::c_char; 0]>, usize) {
        unsafe {
            let (ptr, len) = (self as *mut Self).to_raw_parts();
            (&mut *(ptr as *mut msg<[::std::os::raw::c_char; 0]>), len)
        }
    }
}
```

Upside:
- The flexible array member is directly expressed in a fairly conventional Rust type, with no need to interact with the bindgen-specific `__IncompleteArrayField` type
  - In particular, normal `size_of` will work, taking into account the dynamically sized extension

Problems/TODO:
- depends on unstable `ptr_metadata` and `layout_for_ptr` features
- `from_ptr(_mut)` return references with unbounded lifetimes - it would be up to the caller to constrain them to the underlying storage's lifetime
  - I experimented with a variant which also takes a "lifetime witness" parameter `_lt: &'a LT` which can be used to bound the returned lifetime, but I'll need to try it in practice to see if it works.

This is a prototype for #2771

Reviewed By: dtolnay

Differential Revision: D54605921

fbshipit-source-id: 929ec448b758975ce736c21fde6b805e3d297bb7
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants